\chapter{Modules implementing standardized functionality}

\section{Reader}
\label{sec-reader}

The \commonlisp{} reader is a prime candidate for an
implementation-independent module.  We extracted the \sysname{} reader
to an independent repository under the name \eclector{}%
\footnote{https://github.com/s-expressionists/Eclector}, which was
subsequently greatly improved by Jan Moringen who is now also the
maintainer.

Aside from correctness, one of the main objectives of \eclector{} is
the ability for client code to customize its behavior.  This
customization is accomplished by the following technique:

\begin{itemize}
\item Standard entry points such as \texttt{read} are implemented as a
  simple \texttt{trampoline} to a generic function that takes an
  additional \emph{client} parameter.  The object passed to the
  generic function for this argument is the value of the special
  variable \texttt{*client*}.
\item The main generic function calls other generic functions for
  every aspect of the reader algorithm, each of which also has a
  \emph{client} parameter, so that each of these functions will
  receive the value of the \texttt{*client*} special variable as it
  was when the main entry point was invoked.
\item The default value of the \texttt{*client*} special variable is
  \texttt{nil}, and default methods that implement the reader
  algorithm do not specialize to the \texttt{client} parameter.
\item Client code will bind \texttt{*client*} to some standard object
  of its choice.
\item Clients that need to alter the default behavior of \eclector{}
  can write primary or auxiliary methods on the relevant generic
  functions for the aspect that needs to be customized.  These methods
  will specialize the \emph{client} parameter to the class of their
  chosen standard object.
\end{itemize}

Some examples of the aspects that can be customized are:

\begin{itemize}
\item The interpretation of tokens.  By default, tokens are
  interpreted according to section 2.3 of the \commonlisp{} standard,
  but this behavior is not always wanted by clients.
\item The creation of symbols.  By default, symbols are created
  according to section 2.3.4 of the \commonlisp{} standard, but there
  are many situations where this behavior is not appropriate.  As an
  example, client code may want to avoid that symbols are
  \emph{interned} in its packages.
\end{itemize}

Another main objective of \eclector{} is excellent condition
handling.  A \commonlisp{} reader is an excellent example of one of
the design goals of the \commonlisp{} condition system, i.e., where
low-level code detects a situation that it does not know how to
handle, because there are several possible choices.  The high-level
code that invoked the low-level code must make that decision.  Thus,
\eclector{} signals a condition and proposes one or more
\emph{restarts} that high-level code can invoke.  To make this
mechanism as useful as possible, \eclector{} signals specific
conditions for each such situations, thereby allowing client code to
take the appropriate action.

While an implementation-independent version of the \commonlisp{}
reader was the initial goal of the \sysname{} reader and then of
\eclector{}, we quickly saw other uses for this module.  As it turns
out, many situations require a function that behaves in a way similar
to the \commonlisp{} \texttt{read} function, but with some additional
twist to it.  For that reason, \eclector{} contains two additional
systems that take advantage of the design of of this module:

\begin{itemize}
\item A reader producing \emph{concrete syntax trees}.
\item A reader producing \emph{parse results}.
\end{itemize}

As explained in \refSec{sec-concrete-syntax-tree}, a \emph{concrete
  syntax tree} is a data structure that wraps an ordinary S-expression
in a standard object so that additional information can be provided
\emph{about} the S-expression.  In particular, the wrapper object can
contain information about \emph{source location} of the S-expression.

\eclector{} has a system that reads input and returns it as a concrete
syntax tree.  This features contains additional generic functions that
can be customized by client code:

\begin{itemize}
\item Client code can define the representation of a \emph{source
  position}.  By default, \eclector{} calls the function
  \texttt{file-position} and uses the value as a source position.
  Client code can customize this feature in many ways, including the
  use of a Gray stream with its own representation of position.
\item Client code can also define the representation of an
  \emph{interval}, i.e., two source positions, one for the start of an
  expression and another for the end of that expression.  By default,
  \eclector{} creates a \texttt{cons} cell containing the two
  individual positions.
\end{itemize}

The entry point for the standard reader and the entry point for
creating a concrete syntax tree both ignore syntactic features that
are not returned by a conforming \texttt{read} function.  Such
features include \emph{comments} (both line comments and block
comments) and expressions that were excluded based on \emph{reader
  conditionals}.  Sometimes, however, such syntactic features are
important, for example in a text editor that uses the reader to parse
the contents of a \commonlisp{} code buffer.  To address this
requirement, \eclector{} contains a system that returns \emph{parse
  results} which contain not only the code begin read, but also such
skipped material.

\section{Printer}
\label{sec-printer}

The printer module, named \incless{} has been extracted to a separate
repository%
\footnote{https://github.com/s-expressionists/Incless}.
It currently provides definitions of standard functions such as
\texttt{print}, \texttt{princ}, \texttt{print-object}, etc.  It does
not directly provide methods on \texttt{print-object} for different
classes of objects, because we want it to be possible for client code
to customize this behavior.  Instead, the default method on
\texttt{print-object} trampolines to a generic function named
\texttt{print-object-using-client} which has an additional
\emph{client} parameter.

The \texttt{format} function is implemented in a different library as
described in \refSec{sec-format}.

\section{Pretty printer}
\label{sec-pretty-printer}

Richard C Waters designed the pretty printer for
\commonlisp{} (\cite{Waters89xp:a}, \cite{Waters:1992:UNC:1039991.1039996}).
It might seem a natural choice to use it for \sysname{}, but we found
that the quality of the code was unacceptably low for the standards of
\sysname{}.  As a result Tarn W. Burton created a completely new
library for pretty printing, called \inravina{}%
\footnote{https://github.com/s-expressionists/Inravina}.  \inravina{}
is perfectly integrated with \incless{}, described in
\refSec{sec-printer}.

\section{Format}
\label{sec-format}

The functions \texttt{format} and \texttt{formatter} have been
extracted and are now in the library \invistra{}.
\footnote{https://github.com/s-expressionists/Invistra}

\section{Streams}
\label{sec-streams}

We are using the \cyclosis{} library%
\footnote{https://github.com/s-expressionists/Cyclosis} for the
\commonlisp{} functions and classes related to streams.  This code was
extracted (with permission) from the Mezzano%
\footnote{https://github.com/froggey/Mezzano}
operating system.

\cyclosis{} also provides an implementation of Gray streams.

\section{The \texttt{loop} macro} 
\label{sec-loop}

The \texttt{loop} macro is implemented by the extracted system
\khazern{} located in
\texttt{https://github.com/s-expressionists/Khazern}.

\section{High-level functions on lists}
\label{sec-constrictor}

This module is meant to be a complete implementation of portable
functions and macros in the Conses dictionary (chapter 14 in the
\commonlisp{} standard), except for the low-level functions such as
\texttt{cons}, \texttt{car}, \texttt{cdr}, \texttt{rplaca}, and
\texttt{rplacd} which can not be implemented portably.  For its
implementation, it uses the \texttt{loop} macro.  If any other
functionality is required, it will supply special implementations of
such functionality, so as to avoid dependencies on other modules.

We obtain high performance by identifying important special
cases such as the use of \texttt{:test} function \texttt{eq}, or
\texttt{equal}, or the use of a \texttt{:key} of \texttt{identity}.

We supply compiler macros so as to avoid runtime dispatch whenever a
special-case function can be determined by only looking at the call
site.  This ensures high performance for short lists, where argument
parsing would otherwise represent a significant fraction of the cost
of the call.

This module is fairly complete, and it includes macros such as
\texttt{push} and \texttt{pop} as well as compiler macros for
functions that take keyword arguments such as the mapping functions. 

The module also includes definitions of specific conditions that are
used by this module, together with condition reporters in English for
those conditions.  It also includes English-language documentation
strings for some of the functions. 

This module is now in a separate repository called \constrictor{}.%
\footnote{https://github.com/s-expressionists/Constrictor}

\section{Type system}
\label{sec-type-system}

Alex Wood implemented \texttt{ctype} which is a complete
implementation of the \commonlisp{} type system, and we use it in
\sysname{}.  In particular, it provides implementations of the
standard functions \texttt{typep} and \texttt{subtypep}.

\section{Sequence functions}
\label{sec-sequence-functions}

This module was written entirely by Marco Heisig.  It provides
high-performance implementations of the functions in the ``sequences''
chapter of the \commonlisp{} standard.  High performance is obtained
by identifying important special cases such as the use of
\texttt{:test} function \texttt{eq}, or \texttt{equal}, or the use of
a \texttt{:key} of \texttt{identity}.  These special cases are handled
by macros according to the technique described by our 2017 ELS paper
\cite{Durand:2017:ELS:Sequence}.

In addition to the technique described in that paper, Marco Heisig
decided to write the sequence functions as generic functions,
specialized to the type of the sequence argument.  Many
implementations have specialized versions of vectors, based on element
type, and a method specialized this way can often be significantly
faster than code that uses a generic \texttt{vector} type.  In order
to account for the different set of vector subclasses available in
different \commonlisp{} implementations, a macro
\texttt{replicate-for-each-vector-class} is used to generate a method
for each such subclass.  Client code can customize this module by
defining this macro according to its set of vector subclasses.

This module can be used as an ``extrinsic'' module, i.e., it can be
loaded into an existing \commonlisp{} implementation without
clobbering the native sequence functions of that implementation.  This
feature has been used to compare the performance of the functions in
this module to that of the native sequence functions of \sbcl{}, and
the result is very encouraging, in that many functions in this module
are as fast, or faster, than the native \sbcl{} equivalents.

The \texttt{sort} functions in this module use the technique described
in a paper by Kim and Kutzner \cite{10.1007/978-3-540-30140-0_63}.
This technique is based on merging.

\subsection{Future work}
\label{sec-sequence-functions-future-work}

Concerning the \emph{sorting functions} (i.e., \texttt{sort} and
\texttt{stable-sort}), there is a challenge.  The current
implementation uses a merging technique where no additional space is
required.  However, the current implementation is not as fast as
traditional merging with O(n) extra space.  So the question is whether
there is an intermediate solution where a small amount of additional
space is used whenever there is such space available, for example on
the stack.

This module has been extracted to a separate repository under the name
\consecution{}.%
\footnote{https://github.com/s-expressionists/Consecution}

\section{Hash tables}
\label{sec-hash-tables}

This module was written by Hayley Patton.  It has been extracted to a
separate repositor under the name \salmagundi{}%
\footnote{https://github.com/s-expressionists/Salmagundi}

\section{Condition system}
\label{sec-condition-system}

This module has been extracted to a separate repository named
\predicament{}.%
\footnote{https://github.com/robert-strandh/Predicament}
It is based on the portable condition system written by Michał "phoe"
Herda.%
\footnote{https://github.com/phoe/portable-condition-system}
With permission from the author, we modified the code so that it could
be used as an intrinsic condition system in addition to the use as an
extrinsic system it was originally designed for.

\section{Packages}
\label{sec-packages}

Code for packages is provided by the \parcl{} external library.%
\footnote{https://github.com/robert-strandh/\parcl{}}

\section{Array}
\label{sec-array}

The array module has been extracted with the name \regalia{}.%
\footnote{https://github.com/robert-strandh/\regalia{}}

\section{Structures}
\label{sec-structures}

Sylvia Harrington wrote the \sysname{} module that implements
structures.  This module is now extracted into a separate repository
under the name \anatomicl{}.%
\footnote{https://github.com/s-expressionists/Anatomicl}

\section{Arithmetic}

The code for arithmetic functions is contained in the directory
\texttt{Code/Arithmetic}.  At the moment, this code is very
preliminary.  There is not even a package definition for the package
used in the file \texttt{arithmetic.lisp}.

The code contains definitions of the functions \texttt{+}, \texttt{-},
\texttt{*}, and \texttt{/}.  These functions call binary versions also
defined in the same file.  There are also compiler macros to turn
calls to these functions with a known number of arguments into calls
to the binary versions.

The binary versions of the arithmetic functions dispatch according to
the exact type of the arguments to binary versions of the function
with a specific combination of types, but those specific functions
have not yet been written.  Some of those functions can be written
using portable code, but most of them will probably use
machine-specific low-level instructions. 

We have no specific advice for anyone who might be interested in
working on code for arithmetic functions.  

\section{\clos{}}
\label{sec-clos}

Most of the functionality defined by the book ``The Art of the
Metaobject Protocol'' is provided by the \clostrophilia{} library.%
\footnote{https://github.com/robert-strandh/Clostrophilia}

%%  LocalWords:  subclasses


% LocalWords:  Cyclosis Mezzano
